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ABSTRACT 

This  paper  is  an  analytic  study  of  a  mesh  partitioning 
method  for -storage  allocation  when  a  computer  system,  such  as 
ILLIAC  IV;  requires  that  partitions  (or  pages)  be  of  fixed  size.   The 
storage  allocation  method  (called  C-SKEW)  results  in  memory  assignment 
indices  which  constitute  additive  cyclic  groups  for  the  linear  struc- 
tures of  the  mesh.   A  storage  optimization  algorithm,  which  does  not 
significantly  contribute  complexity  to  the  intended  computation,  is 
presented.   The  algorithm  determines  the  partitioning  method  resiilting 
in  a  minimal  covering  for  a  simple  mesh  or  for  a  complex  mesh  consist- 
ing of  a  composition  of  simple  meshes  where  the  composition  itself  is 
determined.   The  paper  also  demonstrates  how  the  storage  method 
principles  are  applied  to  non-orthogonal  and  multi- dimensional  arrays. 
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1.   INTRODUCTION 

1.1  Scope 

Allocating  storage  for  data  arrays  so  that  access  is  facil- 
itated and  wasted  space  is  minimized  has  long  been  a  problem.   On 
ILLIAC  IV,  itself  an  array  computer  system,  where  each  processing 

element  PE  has  direct  access  only  to  its  nearest  four  neighbors  IE  , 
p  p_fl 

and  EE  n,   the  data  access  problem  is  magnified  and  storage  waste  is 
measured  not  only  in  terms  of  allocated  and  unused  memory  but  also 
allocated  and  unused  processing  power.  While  many  problems  can  be 
dimensioned  to  fit  (i.e.,  minimize  the  adversity  of  these  two  aspects) 
ILLIAC  rv,  it  is  often  the  case  that  they  cannot.   This  paper 
describes  an  array  partitioning  scheme  for  storage  allocation  so  that 
ILLIAC  IV  can  appear  dimensionally  to  fit  most  problems  involving  data 
arrays.  While  this  paper  specifically  discusses  ILLIAC  IV,  the 
storage  allocation  scheme  is  applicable  to  any  computing  system  using 
fixed  size  pages  or  partitions. 

The  storage  allocation  method,  called  C-SKEW,  consists  of 
interlocking  partitions;  they  are  of  fixed  size,  namely  n  elements, 
but  not  of  fixed  shape  so  that  row  and  column  indices  are  simple 
arithmetic  sequences.   Thus,  a  two-dimensional  partitioning  structure 
is  developed  to  cover  the  given  array,  minimally,  while  maintaining 
easy  access  to  linear  structures  of  rows  and  columns. 

A  storage  optimization  algorithm  is  presented  in  this  paper 
which  calculates  the  integer  value  of  C  (in  the  C-SKEW  name)  such 
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that;  given  a  rectangular  array  of  size  p  X  q.  and  the  computer  system 
memory  partition  of  size  n,  the  array  is  minimally  covered  in  the 
sense  that  a  minimum  amount  of  storage  is  wasted.   The  algorithm  is 
further  extended  to  calculate  the  optimal  covering  for  any  figure 
composed  of  many  rectangular  arrays;  in  fact  the  composition  itself  is 
calculated.   Finally,  it  is  demonstrated  that  the  given  array  need  not 
he  orthogonal  or  two-dimensional;  however  the  optimization  algorithm 
is  not  extended  to  these  cases. 

1.2  ILLIAC  IV 

The  array  computer  system  ILLIAC  IV  consists  of  n  processing 
elements  (FE) ,   where  n  e  [6k,   128,  256}  for  1,  2,   or  k   quadrant  opera- 
tion, which  are  capable  of  performing  arithmetic,  logic,  and  indexing 
operations.   Each  processing  element,  PE  ,  can  communicate  with  its 

four  nearest  neighbor  PE  .,  and  PE  n  and  each  PE  has  a  local  20^8 
^       p_fl       p_+o 

word  high-speed  memory.   All  processing  elements  are  issued  identical 
command  microsequences  and  operands,  by  name  or  value,  from  a  single 
control  unit.  When  the  control  unit  references  an  operand  by  name 
(e.g.,  address  cc) ,  each  processing  element  refers  to  word  OC   in  its  own 
memory  modiile;  the  collection  of  these  words  is  called  a  longword  [1]. 
ILLIAC  IV  is  a  powerful  computing  tool  which  is  specifically 
designed  to  perform  a  calculation  algorithm  f  on  n  sets  of  data  X.  si- 
multaneously, where  each  X.  is  a  set  of  r  parameters,  not  necessarily 
disjoint,  and  is  written 


A.  —  IX.-,,  X.^,  ...,  X.   j. 

1   ^  il   i2'       ir-" 
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A   typical  problem  to  "be  computed  on  ILLLAC  IV  has  a  problem  space 

X  =  iX-,  ^  X^}    '  • ' }   X  ] 

where  m  is  much  larger  than  n.   Assume  that  it  is  possible  to  parti- 
tion the  set  X  such  that  each  partition  P^  contains  n  parameter  sets. 
Let 


X  =  tP-L  U  Pg  U  P^  ...  U  P^  U  ...  U  P^,} 


in  "Which 


P.  =  [X^|(i-l)n  <  i  <  in). 

Let  M(x.,  )  be  the  memory  assignment  algorithm  for  the  k-th  parameter 
of  each  set  X.  in  the  partition  P..   Due  to  hardware  design  considera- 
tions of  ILLIAC  rv^  it  is  desirable  that  for  each  k  M(x.  )  be  a  simple 

arithmetic  function  of  i  alone.   When  M(x.,  )   =  CX   for  all  sets  X.  in 

^  ik  1 

the  partition  P.  then  it  is  said  that  OC   represents  a  longword  address. 
An  address  of  parameter  x.,  to  an  observer  external  to  ILLIAC  IV, 
however,  consists  of  the  pair  fM(x.,  ),  m(x   )]  where  m(x   )  =  £,   the 
PE  number. 


1 . 3  Array  Partitioning 

The  objective  of  any  partitioning  scheme  is  to  find  a  method 
of  subdividing  an  arbitrary  m-dimensional  array  or  mesh  into  some 
reasonably  efficient  and  facile  block  structure  while  preserving  the 
linear  array  structures  such  as  rows,  columns,  etc.   When  a  computer 
system's  data  segment  size  is  fixed  at,  say,  n  elements,  then  large 


k 

storage  inefficiencies  may  resiilt  if  n  does  not  divide  the  array 
dimension  values. 

As  an  example,  consider  a  p  x  q.  array  A   with  n  =  6kj   and 
p  =  q  =  81.   Let  partitions  "be  row-segments,  or  rectangles  of  size 
1  X  64.   Then  the  storage  scheme  results  in  8I  x  2  =  162  partitions 
where  8I  x  1  =  8I  are  full  and  contain  6k   elements,  and  8I  x  1  =  8I 
partitions  contain  17  elements  each.  If  column  segments  are  selected 
instead,  the  same  storage  requirement  results.  On  the  other  hand  if 
the  partitions  are  square  (or  residual  rectangles)  of  size  8x8  then 
the  storage  scheme  results  in  11  x  11  =  121  partitions,  where  100  par- 
titions are  fiill  and  contain  6k   elements  each,  20  contain  8  elements 
each,  and  1  contains  1  element.   This  is  an  improvement  (121  parti- 
tions versus  I62),  but  some  of  the  partitions  are  still  sparsely 
populated.   Suppose,  however,  that  rectangular  partitions  of  size 
9x7  are  used  (noting  that  9  divides  8I  and  that  9  X  7  <  6k).      Then 
a  covering  consisting  of  9  X  12  =  I08  partitions  results  where 
9  X  11  =  99  partitions  contain  63  elements  and  9x1=9  partitions 
contain  36   elements.   This  represents  a  considerable  improvement  over 
the  original  storage  scheme  where  I62  partitions  are  required;  in 
fact,  if  the  array  elements  are  taken  6k   at  a  time  without  regard  to 
partition  structure,  then  IO3  such  sets  result. 

Intuitively  it  appears  that  great  memory  savings  can  be  made 
if  the  right  partitioning  scheme  is  used.   On  ILLIAC  IV,  where  a 
processing  element  (EE)  is  assigned  to  each  array  element,  computer 
processing  power  is  conserved  when  the  array  partitioning  is  most 
efficient.  If  the  storage  allocation  procedure  preserves  the  linear 
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array  structures  of  roW;  column^  and  diagonal  then  it  shoiild  not  cost 
additional  processing  power  to  pay  for  the  memory  savings. 

Contrary  to  a  previously  stated  hypothesis ^  it  is  difficult 
to  partition  the  prohlem  space  X  into  disjoint  sets  since  in  many 
partial  differential  equation  and  matrix  problems  it  is  not  necessar- 
ily true  that  X.  H  X.  =0  for  i  ^  j.   That  is,  it  may  "be  the  case  that 
X.,  =  x.„  for  parameters  x_  e  X.  and  x..  e  X.. 

It  is  assumed  that  at  no  time  is  it  desirahle  to  have 
several  copies  of  the  same  parameter  in  memory.  Furthermore,  let  the 
single  copy  of  the  parameter  reside  naturally  in  one  and  only  one  set; 
let  the  natural  assigranent  be  x.,  e  X. .   In  this  case>  whenever  the 

IK     1 

i-th  parameter  of  set  X.  is  desired,  it  will  he  necessary  to  fetch 

J 

the  parameter  x.,  from  the  set  X..   Let  these,  and  all  such  identical, 
parameters  he  clustered  to  form  equivalence  classes.   These  classes 
are  actually  spaces,  called  data  arrays,  whose  dimensions  can  be 
determined  through  normal  application  of  the  principles  of  linear 
algebra.   Such  analysis  is  outside  the  scope  of  this  paper.   It  will 
be  assumed  instead  that  the  data  arrays  are  known  and  are  given  along 
with  the  calculation  algorithm  f  at  the  outset.  What  will  be  shown 
in  this  paper  is  how  the  partitions  P„  are  formed  from  the  parametric 
sets  X.  such  that  the  memory  allocation  of  equivalence  classes,  or 
data  arrays,  is  uniformly  and  uniquely  determined  within  and  without 
the  partitions. 


2.   C-SKEW  STORAGE 

2.1  C-SKEW  Description 

Assume  that  there  are  given  a  tvo- dimensional  array  A 
of  fixed  size  p  X  q.  and  a  fixed  integer  n  which  represents  the  maximal 
segment  size  on  the  computer  system  being  used.   In  the  case  of 
ILLIAC  rv,  n  €  {6k,   128,  256}.   Let  an  integer  c  be  selected  such  that 
c  <  n.  For  each  element  a.  .  e  A  let  there  "be  an  integer  m(a.  .) 
determined  by  the  memory  assignment  algorithm  such  that 


m(a. .)  =  (i  •  c  +  j)  mod  n.  "    (2.1.1) 

Then  C-SKEW  storage  is  a  partitioning  of  the  elements  a. .  e  A  into 

-'-J 

partitions  P  (k  =  1,  2,    3:    •••)  such  that 

(1)  the  partitions  are  exactly  c  columns  wide, 

(2)  the  integers  m(a.  .),  a.  .  e  P,  are  all  different,  and 

(3)  no  more  than  n  elements  are  included  in  a  partition  P,  . 
As  an  example,  consider  Figure  1  where  the  Integers  m(a,j) 

replace  their  associated  elements  a. .  and  the  partitioning  is  as 
indicated.   In  general  it  may  be  said  that  partitions  of  size  r  x  c 
are  formed  such  that 

(1)  re  is  maximally  less  than  or  equal  to  n,  and 

(2)  for  integers  w  and  h 

cw  is  minimally  greater  than  or  equal  to  q  and 
rh  is  minimally  greater  than  or  equal  to  p. 
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Figure  1.   C-SKEW  Storage  p=5,  q=12,  n=8,  c=3^  r=2 


If  the  partitions  represent  data  segments^  as  in  a  conven- 
tional sequential  machine^  then  the  integers  ni(a.  .)  are  relative 
addresses.   In  ILLIAC  IV^  however,  the  partitions  are  longwords  and 
the  integers  m(a. .)  are  in  one-to-one  correspondence  with  the  PE 
number  in  which  the  array  element  resides;  that  is,  a. .  resides  in 
PE  where  p  =  m(a. .). 


2.2  C-SKEW  Properties 

It  is  desirable  to  examine  the  specific  properties  of  the 

linear  structiores  (rows,  columns  and  diagonals)  which  are  consequences 

of  the  C-SKEW  storage  scheme.   This  can  best  be  done  by  analysis  of 

the  memory  assignment  algorithm  m(a. .)  in  equation  (2.1.1). 

-'-J 

It  is  obvious  that  rows  of  the  integers  m(a. .)  constitute  a 
cyclic  additive  group,  in  the  fonnaJ.  sense,  which  is  generated  by  the 
residue  class  1,  and  is  of  order  n  [2];  that  is,  the  group  G  =  (Z  ,+), 
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where  Z  is  the  set  of  residue  classes  modulo  n.  The  group  G  occurs 
n 

l(q  -I-  n  -  l)  4-  nj  times  in  each  rov,  where  q  is  the  numher  of  columns. 

Columns  of  the  integers  m(a. .)  constitute  either  a  cyclic 
additive  group  H^  or  a  coset  of  H  which  is  generated  by  the  element  c". 
If  H  is  a  group,  it  is  a  subgroup  of  G  and  is  therefore  also  cyclic; 
let  it  be  of  order  d.   Define  D(c,n)  to  be  the  greatest  common  divisor 
(GCD)  of  c  and  n.   Since  cyclic  G  is  of  order  n,  then  c  =  cl  is 
a  generator  of  G  if  and  only  if  D(c,n)  =  1  (i.e.,  if  c  and  n  are  rela- 
tively prime).  Furthermore,  if  ra  =  D(c,n)  then  the  order  of  H,  d,  is 
found  by  d  =  n/m.   When  d  =  n,  then  H  is  isomorphic  to  G,  where  the 
isomorphism  is  a  permutation.   The  group  H  or  one  of  its  cosets  occurs 
(p  +  d  -  l)  "^  dj  times  in  a  coliomn,  where  p  is  the  number  of  rows. 

Diagonals  of  the  integers  m(a. .)  constitute  either  a  cyclic 
additive  group  J  or  a  coset  of  J  which  is  generated  by  the  element 
cfl.   If  J  is  a  group,  it  is  a  subgroup  of  G.   The  comments  about  the 
order  of  H  also  apply  for  the  order  of  J.   On  IlilAC  IV  the  only  prime 
factor  of  n  is  "2";  therefore,  if  D(c,n)  =  1,  then  D(cfl,n)  ^  1  and 
vice  versa.  Consequently,  it  is  impossible  to  have  both  H  and  J 
isomorphic  to  G.  Whenever  the  prime  factorization  of  c  contains  only 
one  "2",  then  D(cfl,n)  =  1. 

Recall  that  an  attempt  is  being  made  to  find  an  efficient 
partitioning  scheme  that  allows  access  to  rows  and  columns,  in  groups 
of  order  n.   But  there  seems  to  be  a  dichotomy;  as  the  order  of  H  is 
maximized  it  is  necessary  that  m,  the  GCD  of  c  and  n,  be  minimized. 
This  immediately  implies  that  there  exists  no  integer  r  such  that 
r  •  c  =  n  and  H  isomorphic  to  G.  If  a  maximal  r  is  selected  such  that 


r  •  c  <  n,'  then  the  set  of  integers  mfa.  .)  contained  within  the 
rectangular  partition  of  size  r-rovs  by  c-columns,  do  not  constitute 
an  additive  group  since  the  closure  property  is  not  obeyed.   In  order 
to  maintain  closure ;  empty  elements  must  be  added  to  each  partition. 
This  is  precisely  what  is  implied  by  the  7x9  partitioning  scheme  in 
the  81  X  81  array  example  described  in  section  l.S*   Note  that  an 
application  of  the  C-SKEW  principles  as  defined  would  yield  the  order 
of  H  =  6ij-  since  'D{^,Gh)   =   1.   The  empty  element  in  each  partition 
would  in  general  be  different.  Figure  2  is  an  application  of  the 
three  C-SKEW  storage  principles  and  it  demonstrates  the  ideas 
discussed  in  this  paragraph. 

There  is,  however^  no  reason  to  insist  that  r  be  an  integer. 
Let  r  be  a  rational  number  such  that  r  •  c  =  n.   Then  in  order  to 
ensure  that  the  set  of  integers^  m(a.  .) ,    contained  within  the  r  x  c 
partition  constitute  an  additive  group,  the  partitions  must  be  poly- 
gons of  variable  shape.   Figure  3  demonstrates  the  partitioning  result 
when  r  is  rational;  other  dimensions  are  equivalent  to  those  used  in 
Figure  2.   The  partitioning  scheme  in  Figure  3  provides  a  larger 
covering  than  necessary  and  allows  the  value  of  p  to  grow  from  I5  to 
16  without  the  need  to  increase  the  number  of  partitions.   In  this 
case,  no  advantage  is  realized,  but  the  increased  storage  efficiency 
has  obvious  implications. 

Define  horizontal  strip  to  be  a  row  of  partitions  and 
vertical  strip  to  be  a  column  of  partitions.   Since  G  and  H  are 
cyclic,  ordered,  additive  groups  the  following  properties  of  the 
described  partitioning  scheme  hold: 
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(1)  The  partition  polygons  in  a  given  horizontal  strip  are 
identical  in  shape. 

(2)  In  a  given  vertical  strip^  the  set  of  integer  pairs 

(s  .  s   )  such  that  s  =  ni(a.  .)  e  P-,  and  s    = 
^  k'  k+-V^  k    ^  ij'^    k      k+w 

ni(a.  .)  €  P,    and  s  _,  s,    span  the  horizontal  boimdary 
is  the  same  for  all  horizontal  "boundaries. 

(3)  If  d  is  the  order  of  H,  then  the  residue  classes 

k  =  {i|k  =  i  mod  d  for  k  =  0,  1,  . . . ,  d  -  1}  describe 

identical  array  rows  of  integers  iii(a.  .)• 

-'-  J 

(k)     A   straight  horizontal  boundary  occurs  at  rows  in  the 
class  0  (modulo  d). 

(5)  If  m  =  D(c,n)^  there  are  precisely  m  -  1  distinct 
CO sets  of  the  group  H. 

(6)  The  union  of  H  and  all  distinct  cosets  of  H  is  G,  i.e., 
H  and  its  cosets  partition  G. 

It  is  instructive,  at  this  point,  to  examine  an  array  stored 

using  the  three  C-SKEW  principles  when  the  order  d  of  H  is  less  than 

n.   In  Figure  h  c  =  6   and  n  =  16,  so  m  =  2  and  d  =  8.   Therefore, 

groups  of  H  occur  in  columns  0  (modulo  2)  and  cosets  of  H  occur  in 

columns  1.   A  straight  horizontal  boundary  occurs  at  row  8.   In  this 

example  we  see  that  there  is  no  way  to  find  n  distinct  integers  m(a. .) 

-'-J 

in  a  column.  If,  however,  a  scheme  could  be  found  so  that  a  column 
would  contain  H  followed  by  each  of  its  cosets,  then  n  unique  integers 
m(a. .)  would  exist  in  a  contiguous  set  in  each  column.   If  an  extra 
skew  by  one  is  performed  at  rows  in  the  class  0  (modulo  d)  then  H  and 
its  cosets  occur  in  each  column;  however,  in  this  case,  the  sequence 
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of  integers  ni(a.  .)  is  not  a  simple  arithmetic  sequence  (i.e.,  property 
-'-J 

(2)  does  not  hold).  Figure  5  demonstrates  the  effect  of  the  extra 
skew  at  row  8  as  performed  on  the  array  in  Figure  h.      In  column  0 
(modulo  2)  group  H  is  followed  by  the  coset  1  +  H. 

2.3  Storage  Classification  and  Nomenclature 

In  section  2.2  the  rationale  for  the  various  storage  types 
(of  this  class)  is  discussed.  In  this  section  the  storage  types  are 
formally  defined.  Some  information  is  repeated  here  for  the  sake  of 
completion. 

Given  a  p  x  q.  array  A   and  an  integer  n  then: 
Definition  2.3'1:   C-SKEW  storage  is  a  contiguous  partitioning  of  the 
elements  a. .  e  A  such  that 

(1)  for  a  selected  integer  c  <  n,  the  partitions  P  are 
exactly  c  columns  wide; 

(2)  for  a  rational  number  r  satisfying  r  •  c  =  n  the 
partitions  are  r  rows  high;  and 

(3)  for  the  elements  a. .  €  P  the  integers  m(a. .)  = 
(i  •  c  f  j)  mod  n  are  all  different. 

It  is  clear  that  the  C-SKEW  storage  definition  yields  partitions  which 
are  polygons.   A  variation  of  C-SKEW  is  the  following: 
Definition  2.3.2:  C-CHECKER  storage  is  a  contiguous  partitioning  of 
the  elements  a. .  g  A  such  that 

(1)  for  a  selected  integer  c  <  n,  the  partitions  are 
exactly  c  columns  wide; 
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(2)  for  a  raaximum  integer  r  satisfying  r  •  c  <  n  the 
partitions  are  r  rows  high;  and 

(3)  for  all  elements  a.  .  €  P,  the  integers  m(a.  .)  = 

(i  •  c  f  j)  mod  n  are  all  different  and  the  set  of 
array  elements  assigned  to  the  relative  complement  of 
{m(a^^)  I  a^^  e  P^^}  is  empty. 

Hence  C-CHECKER  defines  rectangular  partitions  which  may  contain  less 

than  n  elements.   A  further  variation  of  C-SKEW  is: 

Definition  2 . 3 ■ 3 •   C-STRIPSKEW  storage  is  a  contiguous  partitioning 

of  the  elements  a. .  e  A  such  that 

(1)  for  a  selected  integer  c  <  n,  the  partitions ■ are 
exactly  c  columns  wide; 

(2)  for  a  rational  number  r  satisfying  r  •  c  =  n  the 
partitions  are  r  rows  high;  and 

(3)  for  each  a.  .  e  P,  the  integers  m(a.  .)  =  (i  •  c  +  j  f 
Ii4dj )  mod  n  are  all  different  where  d  is  the  order  of 
H  and  I -r-J  specifies  integer  division. 

Thus^  C-STRIPSKEW  provides  the  extra  skew  so  that  the  cyclic  group  H 
and  all  of  its  cosets  occur  in  each  column. 

2.k     C-SKEW  Assignment  Algorithm 

It  is  instructive  to  study  the  covering  of  a  p  x  q  array  "by 
the  C-SKEW  partitioning  scheme  as  illustrated  in  Figure  6.  The  array 
covering  is  w  vertical  strips  wide,  where  each  strip  is  of  width  c  and 
c  •  w  >  q.   The  i-th  vertical  strip  is  illustrated,  and  a  general 
longword  or  partition  vill  look  like  the  (tw  -i-  i)-th  partition 
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Figure  6.   C-SKEW  Storage,  Polygonal  Partitions 
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depicted  in  the  figure.  The  columnar  position  within  the  partition  is 
determined  hy  the  quantity  j(raod  c).   Note  that,  in  general,  the  upper 
and  lover  boundaries  of  a  horizontal  strip  are  jagged,  "but  for 
m  =  D(c,n)  the  boundary  at  row  n/m  =  d,  the  order  of  H,  is  straight. 

Let  the  array  declaration  A[i,  :|j., ,  £    :[i  ,    ...,  £    :\x  ] 
define  a  k-diraensional  array  where  £.:\i.   represents  the  lower  and 
upper  bounds  of  the  j-th  dimension.  If  i  *  =  i  .  -  i  .  and  |j.'.  =  [i .   - 

J       J       O  d       J 

i  .  -f  1  then  for  a  =  ATi^,  1^^  •••^  ii  ]^  an  element  of  the  array  A,  the 
longword  address  M(a)  (or  partition  number)  is  found  by 


M 


(a)  =  l(...  ((i^)  ^12-^12)  J^3  +  •••  +  ^k-2^  * 


ft 


K-i 

'U 

.    c    -t-  i '   mod  c 
k 

r 

1 

- 

n 

2. 

[^^  +  (c-1) 

■1- 

K 

c 

c 

J 

L      J 

(2.4.1) 


where  the  square  brackets  indicate  integer  arithmetic  and  [  J-,  or  [  Jp 
are  used  if  rectangular  partitions  or  polygonal  partitions  are 
desired,  respectively  [3]-   For  the  same  element,  a,  of  the  array  A 

then  the  PE  number  m(a)  is  found  by 


m 


(a)  =  (i'  I  •  c  +  i,  )  mod  n. 


(2.4.2) 


Thus  the  two  quantities  M(a)  and  m(a)  completely  define  the  storage 
assignment  of  array  element  a  in  ILLIAC  IV. 
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3-      C-DISTMCE  SELECTION 

3.1  Array  Covering  Goals 

Recall  that  array  A  is  subdivided  into  r  x  c  partitions 
where  re  <  n,  a  given  integer.   The  union  of  the  partitions  consti- 
tutes a  covering  of  A  where  the  covering  is  a  h  x  w  array  A'  of 
partitions  with  rh  >  p  and  cw  >  q  in  which  each  element  of  A' 
represents  one  partition  of  A.   In  section  3-2  a  homomorphic  mapping 
from  A  into  A'  is  described.  While  the  parameters  p,  q_^  and  n  are 
fixed,  the  parameters  r,   c,   h,   and  w  are  variable,  within  the  con- 
straints.  Hence  the  array  A'  is  a  variable  whose  parameters  can  be 
adjusted  to  fit  ILLIAC  IV  memory.  Whenever  the  quantity  hw  is 
minimized  the  memory  usage  efficiency  of  the  covering  is  maximal. 

A  less  important  consideration  than  mapping  efficiency  is 
indexing  ease  to  fetch  and  store  the  linear  structures.   This  goal  is 
easily  achieved  by  computer  systems  which  have  several  index  registers 
and  can  perform  modulo  arithmetic;  on  computer  system^  having  only  one 
index  register  and  limited  word  masking  capability  it  can  be  done,  but 
awkwardly.  The  ILLIAC  IV  computer  system  not  only  has  fovoc   global 
index  registers  but  also  each  EE  has  two  local  index  registers. 

3.2  System  Constraints 

The  number  of  r  x  c  partitions,  each  containing  n  elements, 
is  given  by  the  quantity  hw,  which  is  to  be  minimized.   It  is  easily 
seen  that  the  greatest  lower  bound  g  for  hw  taking  the  array  elements 
n  at  a  time  is  given  by 


m   +  (n-l) 
n 
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(3.2.1) 


where  the  square  hrackets  indicate  that  integer  arithmetic  is  used. 
The  memory  mapping  efficiency  of  the  array  A  then  is 


hw 


While  it  is  of  primary  importance  to  maximize  f  it  may  "be  assumed  that 
a  partitioning  scheme,  and  hence  an  array  covering,  vhich  minimizes 
the  excess  covering  on  rows  alone  is  a  step  in  the  right  direction. 
That  is,  minimize  wc  >  q. 


Let  c .  be  an  estimate  for  the  value  of  the  positive  integer 


Then 


w.  = 

1 


q  ^  (c.-l) 


c. 


(3.2.2) 


is  the  minimum  value  of  w  such  that  wc.  >  q.   But  given  this  value  w. 

1  —  ^       °  1 

the  quantity  wc  is  minimized  only  when  the  new  value  of  c, 


c .  = 

1 


q  +  (w^-1) 


w. 

1 


(3-2.3) 


is  used.  For  no  other  values  of  c  and  w  such  that  c.  <  c  <  c.  and 

w.  >  w  is  wc  minimized. 

1 

If  the  partitions  are  polygons  then  r  is  a  rational  number 
and  r  =  n/c.   Recall  that  hr  >  p  is  required.   Then  it  is  required 
that  h  >  pc/n,  and  hence 


h.  = 

1 


pc^  f  (n-l) 


n 


(3.2.1^) 
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is  the  minimum  value  of  h  to  meet  the  constraint.   Thus,  for  each  6".  , 

'  1 

a  minimized  quantity  h."w^  is  determined. 
Define 


c.  ,  =  c.  -  1  ,  (3.2.5) 

1+1    1     '  \~>       >  I 

where  c  =  n  and  let  a  set  of  solutions  to  equations  (3.2.2)  through 
(3.2.5)  "be  S  =  {c.|c.  >13'   Clearly  S  is  finite  and  contains  no  more 
than  n  elements.   Let 


T  =  {t. It.  =  h.w.  for  all  c.  g  S)  . 

s    '  1 1  1    11         1 

Then  T  is  a  set  of  memory  mapping  measures  t.  of  the  elements  in  S. 

On  ILLIAC  IV^  other  considerations  in  storage  selection 

include  indexing  and  data  routing  "between  PE's.   Since  the  storage 

scheme  described  above  results  in  rows  of  the  integers  m(a. .)  consist- 

ing  of  cyclic  groups  G  generated  by  1  and  columns  of  m(a. .)  consisting 

-■-J 

of  cyclic  groups  H  generated  by  c",  then  on  IlilAC  IV  the  inter-EE 
route  distance  of  adjacent  array  elements  in  a  row  is  1  and  the  route 
distance  between  adjacent  elements  in  a  column  is 

r(c)  =  -TT-  +  min  (z  mod  8,  9  -  z  mod  8) 

where  z  =  min  (c,  n-c).   Let 

U^  =  {u. |u.  =  R(c.),  c.  €  S)  . 

Then  U^  is  a  set  of  data  routing  measures  u.  of  the  elements  in  S. 
s  L 

Finally  the  order  d  of  H  plays  a  role  in  STRIPSKEW  storage  if  column 
access  is  required  in  the  calculation  algorithm;  let 
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V  =  {v. Iv.  =  D(c.,n),  c.  €  S]  . 

Then  V  is  a  set  of  the  n/d.  ratio  measures  v.  of  the  elements  in  S. 

s  '  1  1 

Assimie  that  these  three  measures  are  the  only  items  affect- 
ing the  optimality  of  memory  storage  selection.   Then  there  is  a 
three  dimensional  sample  space  X  such  that  if  x.  e  X  then 

X.  =  (t. ,  u. ,  V. ) . 

If  there  is  some  way  of  weighting  the  relative  importance  of  these 
measures^  then  a  discriminant  function  may  be  determined. 

A  reasonable  approach  is  to  relate  the  measures  in  accor- 
dance with  resultant  execution  time  on  ILLIAC  IV.   Suppose  that  the 
basic  execution  algorithm  requires  b,  time  to  compute  on  ILLIAC  IV. 
Let  bp  be  the  additional  time  required  for  all  coliamnar  data  route 
distances  greater  than  one  and  let  b_  be  the  additional  time  required 
to  access  n  elements  of  a  column  simultaneously,  i.e.  whenever  the 
ratio  of  n/d  is  greater  than  one.   Then  the  function 

t(u.,v.)  =  \   +  b^Cu.  -  1)  f  b^Cv.  -  1) 

is  clearly  a  linear  discriminant  function  and  describes  the  time 
required  to  perform  the  execution  algorithm  one  time  on  ILLIAC  IV  [4]  • 
However J  the  algorithm  is  performed  t.  times.  Hence  the  discriminant 
function  is 


;(x. )  =  (b^  f  b^u.  +  t^v.  )t.  (3.2.6) 
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where  b,  =' "G",  -  Td^  -  "b  .  Selecting  the  minimum  g(x.)  for  all  x.  €  X 
yields  the  most  efficient  storage  method  and  skew  distance  c  for  the 
given  array.  The  total  storage  allocation  efficiency^  F,  is  found  by 

min  g(x^)  * 

Equations  (3«2.l)  through  (3. 2. 6)  are  implemented  as  an 
ALGOL  [5]  procedure;,  called  PARTITION,  in  the  appendix.   The  output  of 
the  procedure  is  printed  and  consists  of  the  set  of  solutions  S  along 
with  partition  size,  dimensions  of  A',  resulting  maximum  array  dimen- 
sions with  polygonal  and  rectangular  partitioning,  data  route 
distances,  total  number  of  partitions  required,  and  discriminant  for 
each  c.  e  S.   The  procedure  also  determines  the  skew  distance  c  which 
results  in  optimum  computer  system  usage. 
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h.      COMPOSED  FIGURE  COVERING 

^.1  Composition  Description 

The  storage  allocation  algorithm  described  in  chapters  2  and 
3  is  a  reliable  tool  for  finding  the  most  efficient  partitioning 
scheme  of  a  rectangular  area  representing  a  p  X  q  array  A.   This  chap- 
ter indicates  how  that  algorithm  may  be  used  to  find  the  best 
partitioning  scheme  resulting  in  storage  allocation  for  an  arbitrarily 
shaped  two-dimensional  region  composed  of  rectangles. 

Let 

(l)  R  =  {A.  |i  =  1,  2,    . . . ,   m,  A.  of  dimension  p.  X  q.. } 

be  the  set  of  all  possible  rectangles  which  constitute  convex  sub- 
regions  of  a  given  two-dimensional  region  B^  which  has  straight  edges; 


(2)  C.  =  A.   U  A.   U  ...  U  A. 

^    ^1    ^2 

be  one  of  the  possible  coverings  where  A.   fl  A   =  $  for  1  <  j^  k  <  s 

and  j  jt  k,  A.   e  R;  and 
0 

(3)  r=  {C.  |all  possible  C.  such  that  Be  C.}* 

Then  r  is  the  class  of  all  coverings  of  the  region  B.   The  objective 

is  to  find  that  C.  e  F  which  resiilts  in  the  minimum  value  of  the 

aggregate  discriminant  function 

s 
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where  g,  (x-  )  is  determined  for  each  skew  distance  c,  for  each  array 

A-  .   The  minim.'uni  g,  (X. )  for  all  coverings  and  all  skew  distances 
1^  k  1 

yields  the  best  memory  storage  allocation  for  the  region  B. 


^.2  Directed- graph  Production  Procedures 

The  class  of  all  coverings  of  a  given  region  B  is  isomorphic 
to  a  directed- graph  where  each  C.  e   P  is  represented  by  a  unique  path 
through  the  graph.   Construct  the  graph  in  accordance  with  the 
following  rules: 

(1)  Extend  all  edges  of  the  region  B  to  plus  and  minus 
infinity,  and  ignore  all  extension  parts  which  do  not  pass  through  B. 
The  edge  extensions  result  in  simple  convex  polygons  which  are  rectan- 
gles; label  these  in  some  unique  way.   The  edge  extensions  exist  in 
pairs  {£,Jl^)   emanating  from  a  vertex  in  each  concave  subregion  of  B. 

(2)  Identify  each  simple  terminal  polygon  (i.e.,  with  only 
one  edge  facing  the  remainder  of  the  region)  and  select  one  as  a 
starting  point  or  reference.   Draw  all  possible  paths  connecting  it 
with  each  of  the  remaining  terminals  where  each  path  may  pass  through 
each  simple  polygon  one  and  only  one  time;  wherever  it  is  possible 

to  reach  two  separate  terminals  unambiguously  from  a  given  rectangle 
the  path  may  also  divide. 

(3)  For  each  path  sequentially  number  the  edge  pairs 
{£•?&'.)    so  that  £.    is  the  member  closer  to  the  reference  polygon  and 
£.    is  the  member  farther  from  the  reference.  Figure  7  describes  the 
edge  pair  labelling  process  discussed  here. 
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Figure  7.      Boimdary  Labelling  Example 
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Figure  8.   Simple  Di-graph  Section 
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{k)     Whenever  a  path  crosses  each  mernber  of  the  edge  pairs 
{£.,£1)   and  {£ ,      ,£\   ,)   one  and  only  one  time  then  the  di-graph 

XI  1 +J.   X  •t"X 

section  given  in  Figure  8  may  be  used  to  describe  the  polygonal 
connectivity  between  the  foiir  members^  where  E.  is  the  convex  com- 
posed polygon  formed  between  £.    and  £.    -,  ^  F.  is  the  convex  composed 
polygon  formed  between  £\    and  £\      ,   and  their  intersection,  union, 
and  difference  are  also  rectangular  convex  polygons. 

(5)  Whenever  a  path  separates  or  divides,  as  in  a  "T",  a 
possible  composition  is  described  in  Figure  9'   Note  that  any  single 
exit  from  the  node  labelled  i.  must  lead  to  both  simple  graphs  G.. 
and  Gp  and  that  both  exits  from  £\    node  must  be  taken.   The  polygon 
notation  E.  and  F.  has  the  same  meaning  as  in  step  {k)    and  the 
prescript  specifies  the  simple  graph  to  which  it  belongs.  Figure  9 
does  not  yield  all  possible  convex  compositions  of  polygons,  but  it 
provides  sufficient  interleaving  of  the  two  simple  graphs  for  the 
purposes  of  this  paper. 

(6)  If  the  given  region  results  in  a  path  through  the 
region  which  behaves  in  a  way  not  described  in  steps  {h)    and  (5), 
e.g.,  crossing  an  edge  pair  member  twice,  then  other  di-graph  sections 
similar  to  those  of  figures  8  and  9  can  be  specified.   However,  it  is 
beyond  the  scope  of  this  paper  to  do  a  complete  analysis  in  graph 
theory. 

(7)  For  each  path  through  B  construct  a  di-graph  by 
adjoining  identical  nodes  of  the  building  blocks  or  graph  sections 
developed  in  steps  {h) ,    (5)  and  (6).  Whenever  commonality  exists, 
merge  the  graphs  for  each  path  into  a  single  graph. 
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(8)  Chain  the  graph  terminal,  which  represents  the  distinct 
terminal  polygon  at  the  end  of  the  path  through  B,  to  the  graph 
terminal  of  all  sibling  graphs  (i.e.,  resulting  from  a  "T"  in  the 
path)  to  form  a  single  terminal  for  the  compoiind  graph. 

These  rules  may  he  obscure,  but  an  example  of  their  use  is 
indicated  in  figures  10,  11,  and  12.   In  this  case  there  is  a  single 
path  through  the  region  which  separates  to  the  two  end  terminals.   The 
di-graph  in  Figure  12  describes  the  class  r  of  all  possible  coverings 
of  the  region  B  such  that  each  path  through  the  di-graph  represents 
a  disjoint  covering  C.  of  the  region  (i.e.,  B^c).   Since  the 
di-graph  completely  defines  T,  it  may  be  analyzed  sys.tematically. 

^.3  Critical  Path  Analysis 

Let  a  section  of  the  given  region  B  be  denoted  by  A-    ,  a 

3 
component  of  the  covering  C,  and  let  A.   represent  an  array  of 

J 
dimension  p.   x  q.-  •   Application  of  the  partitioning  algorithm 

J     J 
described  in  section  3*2  on  the  array  A.   yields  a  set  of  skew 

distances  S.  .   For  each  c  £   S-   partition  dimensions,  storage 

efficiency  measures  and  discriminant  values  are  determined. 

The  goal,  of  course,  is  to  find  a  skew  distance  c  which  is 

common  to  all  components  of  a  covering  C.  (i.e.  c  e  S.  for  all  j  <  s) 

^  3 

and  which  has  an  aggregate  smaller  discriminant  value  (i.e.  min  g(X. ) 

for  all  i  with  C.  e  r)  than  any  other  skew  distance  in  any  other 

covering.  Utilization  of  the  graph  developed  in  section  k.2   makes 

this  a  straightforward  and  simple  task. 


Figure  10.      Drill-press  Cross   Section 
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Figure  12.   Covering  Di-graph  of  Figure  10 
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Each  path  of  the  graph  represents  a  component  A^     for  some 

C.  €  r.  Application  of  the  partitioning  algorithm  resiiLts  in  a 

solution  set  S.  for  each  such  A^   .     The  set  of  skew  distance  solu- 

j  3 

tions  S.  attendant  at  a  node  N.  is  the  set  for  which  there  exists  some 

path  in  the  di-graph  from  the  starting  node.  Therefore  each  node 

solution  set  S.  is  fo\ind  "by 

S.  =  \y    S^  ,  where 

J.   =  (i.|A-   constitutes  an  input  path  to  the  node  N^}. 

While  perfonning  the  set  union  operation,  compare  discriminant  values 

for  each  skew  distance  c,  £   S„;   discard  all  duplicate  c's  from  S. 

except  those  with  the  smallest  discriminant  value  g,  (x)  and  record  the 

index  of  the  path  S.   containing  this  c,  . 

3  ^ 

The  set  of  skew  distance  solutions  on  each  output  path  from 

node  N.  is  the  set  common  to  both  the  path  and  the  node  and  is  there- 
fore S^     ^  ^n   f°^  ^1  paths  i.  such  that  Aj_  constitutes  an  output 

3  3 

path  from  the  node  N..  While  performing  the  set  intersection  opera- 
tion, augment  the  accximulated  discriminant  value  g,  (X. )  of  each 

K   1 

c,  G  S.  with  the  value  g  (x^  )  for  the  c  £   S-  .   Continue  these 
K  l  ^3  ^3 

operations  sequentially  until  all  terminals  are  reached  in  each  simple 
graph;  then  perform  the  described  set  intersection  for  each  chain  in 
the  graph. 

When  the  ultimate  set  S  is  determined,  the  skew  distance 
having  the  smallest  aggregate  discriminant  value  (i.e.  min  g,  (X. ) 
for  all  k)  is  the  best  possible  partitioning  scheme  for  storage 
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allocation.  The  recorded  path  indices  for  that  skew  distance  define 
the  path  i  through  the  di-graph  and  therefore  the  composition  of  the 
best  covering  of  B. 

A  complete  critical  path  analysis  of  the  region  given  in 
Figure  10  is  heyond  the  scope  of  this  paper;  however  such  an  analysis 
was  performed  and  the  results  are  summarized  here.  Dimensions  and 
compositional  meshes  were  assigned  to  the  region  so  that  the  mesh 
density  would  "be  uniform  throughout.   The  partition  size  was  selected 
at  n  =  6k.      The  analysis  indicates  that  the  optimum  partitioning 
results  when  a  skew  distance  of  c  =  25  is  selected  and  that  the  hest 
compositional  covering  is 

C.  =  AqA^  U  AgB^  U  B^C^  U  Cg  U  D^D^E^  U  E^  U  F^. 

The  aggregate  discriminant  value  indicates  that  the  efficiency  is  on 
the  order  of  90  percent. 

k.h     Compositional  Pragmatics 

There  has  been  no  discussion  about  the  pragmatics  of  compo- 
sitionally  adjoining  distinct  arrays  for  which  partitioning  schemes 
are  separately  derived.   Specifically,  the  problems  of  row  and  column 
alignment  have  thus  far  been  ignored,  as  have  horizontal  and  vertical 
strip  alignment.   These  problems,  however,  are  only  apparent  and  not 
real. 

Since  all  rows  of  the  memory  assignment  integers  m(a) 
consist  of  cyclic  groups  G  generated  by  1,  it  is  simply  necessary  to 
observe  where  the  rows  are  truncated  in  one  array  and  begin  the  row 
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assignments  in  the  adjoining  array  with  the  next  value  in  the  cycle. 

Furthermore,  since  all  arrays  in  the  composition  are  partitioned  using 

the  same  skew  distance  c,   then  all  column  integers  m(a)  consist  of  the 

cyclic  group  H  or  a  coset  generated  by  "c.   An  additive  constant 

included  in  the  memory  assignment  equations  of  chapter  2  compensates 

for  this  consideration.   Alignment  of  horizontal  or  vertical  strips 

between  adjoining  arrays  should  not  he  requested  since  such  a 

constraint  would  seriously  reduce  the  storage  efficiency  of  the 

composite  covering. 

There  is,  however,  one  other  consideration  which,  if 

implemented,  woTold  increase  the  storage  efficiency.  Recall-  that  the 

partitioning  and  storage  scheme  of  an  array  Aj_  results  in  a  covering 

J 
array  A?  where  A.  c  A.'   and  A.'   is  of  dimension  (hr).  x  (wc).   such 
J        ^j  -  ^j      ^j  ^j      ^j 

that  (hr)j_  >  p.   and  (wc).   >  q.  .   Since  the  critical  path  analysis 

is  performed  in  a  sequential  fashion,  then  the  reduced  arrays 

,  B.   =  A.   -  A.    for  each  skew  c,  e  S.   could  be  used  instead  of 
k  1 .    1 .    1 .  ,  k    1 . 

.3     J     J-1  J 

A-j^  .   Since  many  Bj^  arrays  result  from  each  A.   array,  it  is 
estimated  that  the  analysis  task  is  increased  by  approximately  20 
percent.  Furthermore,  this  analysis  is  not  invariant  under  selection 
of  the  reference  polygon  for  the  di-graph  constinction.   Consequently 
a  di-graph  must  be  constructed  for  each  terminal  in  the  region.   This, 
of  course,  increases  the  analysis  task  by  several  orders  of  magnitude. 
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5.   C-SKW  EXTENSIONS 

^«1  Non-orthogonal  Arrays 

Although  the  word  "rectangular"  has  judiciously  teen  avoided 
in  the  discussions  of  arrays  and  their  partitions  in  chapters  2 
through  3^  the  idea  of  orthogonality  is  certainly  implied  "both  by  the 
figures  and  "by  such  concepts  as  "row  and  column.  "  It  is  not  a  neces- 
sary condition,  however,  that  "row"  and  "column"  he  orthogonal.   It  is 
the  purpose  of  this  chapter  to  describe  briefly  some  of  the  potential 
non-orthogonal  partitioning  schemes  and  to  explore  some  of  their 
possible  uses. 

The  first  alternative  which  comes  to  mind  is  that  partitions 
be  parallelograms  where  the  array  or  mesh  points  comprising  columns 
are  parallel  to  one  set  of  sides  and  the  points  comprising  rows  are 
parallel  to  the  other  direction.   Such  a  storage  scheme  is  represented 
in  Figure  13  by  a  sample  vertical  strip.   The  memory  assignment 
integers  m(a)  for  a  e  A  are  not  explicitly  stated  but  are  implied  by 
the  group  generators  1,  o. ,    and  cfl  for  rows,  columns,  and  diagonals 
respectively.   The  angle  Q   described  by  the  intersection  of  a  row  and 
column  becomes  an  additional  parameter  of  the  calculation  algorithm  on 
the  mesh  which  possibly  would  not  exist  explicitly  in  the  orthogonal 
case.   The  big  advantage  of  parallelograms,  however,  is  that  the 
concept  of  general  triangular  partitions  is  easily  introduced  by 
drawing  a  diagonal  of  the  parallelogram. 


1 
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Figure  I3.   C-SKE¥  Storage  Vertical  Strip 
with  Non-orthogonal  Partitions 


Not  all  parallelograms,  of  course,  can  be  bisected  such  that 
two  triangles  are  formed  with  unambiguous  partition  allocation  of  mesh 
points.   However,  when  it  is  possible  then  the  triangle  linear  struc- 
tures of  the  base  and  the  two  legs  are  generated  by  1,  c,  and  c+k 
(modulo  2n)  where  k  is  some  integer.   The  modulus  is  taken  at  2n  since 
it  is  desired  that  each  triangle  be  of  size  n.   Of  course,  a  given 
partition  in  the  computer  system  contains  elements  of  both  triangles, 
but  the  two  triangles  are  totally  represented  in  two  partitions. 
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On  ILLIAC  IV,  n  =  2-^  where  p  =  6,  7,  or  8.   It  appears  that 
if  k  =  1  then  only  when  p  is  odd  does  reasonable  efficiency  result. 
When  k  =  2  and  p  is  even  the  triangular  partitions  may  result  in 
exactly  n  mesh  points  and  therefore  highest  storage  efficiency.  A 
horizontal  strip  triangulated  with  k  =  2  is  presented  in  Figure  1^. 


Figure  ik.      Triangulated  Horizontal  Strip  with  Linear 
Structures  Generated  hy  T,  c,   c-f-2 


A   very  interesting  and  useful  consequence  of  a  strip  such 
as  that  presented  in  Figure  lU  results  when  the  hase-leg  incidence 
angle,  0  _,  of  the  upright  triangles  is  not  the  same  as  0p  of  the 
inverted  triangles.   Of  course,  the  two  triangle  types  cannot  be 
coplanar  and  still  be  connected  to  each  other.   If  9p  >  0.,  then  a 
cylinder  of  such  a  horizontal  strip  has  a  circumference  at  the  bottom 
which  is  larger  than  that  on  top.  A   whole  set  of  these  strips  could 
be  used  to  represent  a  hemisphere  in  the  same  way  that  a  geodesic  dome 
is  constructed.   If  it  is  desired  that  a  mesh  of  constant  density 
result  on  the  dome,  then  each  horizontal  strip  must  be  constructed 
anew  without  regard  to  mesh  or  partition  alignment  with  the  preceding 
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strip.   Interpolation  at  the  discontinuity  poses  no  problem  on 
ILLIAC  rv  since  the  rows  hounding  the  discontinuity  are  in  memory 
locations  of  group  G  generated  by  1  and  therefore  of  order  n. 

^.2  Multi-dimensional  Partitions 

The  concept  of  two-dimensional  partitions  is  used  throughout 
this  paper,  however  the  ideas  and  partitioning  methods  presented  here 
are  not  restricted  to  this  class.   Any  k-dimensional  partition,  where 
k  <  n,  is  acceptable.   The  generators  for  the  memory  assignment 
integers  m(a),  where  a  e  A,  an  array,  of  a  k-dimensional  partitioning 
scheme  are  found  by 


c.  -,  =  C.W.,  for  1  <  i  <  k  and 

1-1    1  i'  — 

where  w.  is  the  width  of  the  partition  in  the  i-th  dimension  and  the 
modulus  is  n.  The  partition  has  jagged  boundaries  in  the  first 
dimension  in  the  same  way  that  the  two-dimensional  polygonal  parti- 
tioning scheme  has  the  appearance  of  a  jig-saw  puzzle  between  rows. 
The  memory  assignment  integers  m(a)  form  additive  cyclic  groups  G. 
which  are  generated  by  c.  and  are  of  order  d.  for  each  dimension 
i  <  k.   These  ideas  are  thoroughly  explored  in  chapter  2  and  the 
comments  about  linear  structures  hold  for  k-dimensional  partitioning 
as  well.   As  an  illustrative  example  consider  Figure  15  where  k  =  3^ 
c_  =  1,  Cp  =  c,  c^  =  re,  and  Wp  =  r.  By  counting  dots  in  the  figure 
it  is  obvious  that  c  =  5^  r  =  3  and  n  =  6k.      Consequently  each  c.  is 
relatively  prime  to  n  so  that  d  =  dp  =  d,  =  n. 
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^ . 3  Conclusion 

At  this  point  there  should  he  no  question  about  the  feasi- 
bility of  making  fixed  size  partitions  provide  efficient  storage 
utilization  for  any  array  of  arbitrary  size  and  an  arbitrary  number  of 
dimensions.   The  partition  is  simply  shaped  to  fit  the  given  array 
instead  of  attempting  to  redefine  the  array  to  fit  the  partitions. 
Fiirtherraore  it  is  demonstrated  that  efficient  storage  allocation  is 
not  restricted  to  "nicely  shaped"  arrays  but  is  equally  suited  to  a 
mesh  that  can  be  described  systematically.   Triangulated  partitions, 
which  need  not  be  coplanar,  open  the  vista  of  mesh  storage  for  many 
topological  siirfaces.   Application  of  the  principles  of  the -storage 
optimization  algorithm  to  the  non-orthogonal  and  multi- dimensional 
partitioning  schemes  should  be  an  intuitive  step. 
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APEEKDIX 

ALGOL  IMHiEMEKTATION  OF  THE 
COVERING  OPTIMIZATION  ALGORITHM 

This  section  is  an  implementation,  in  ALGOL  [5]  of  the 
analysis  descrihed  in  chapter  3  ^y  "the  system  constraint  equations 
(3.2.1)  through  (3.2.6).   Also  included  is  a  sample  program  output 
of  the  solution  set,  S,  of  potential  skew  distance  candidates.   The 
output  program  demonstrates  the  results  of  array  and  partitioning 
specifications  of 

P  =  135, 

Q  =  81,  and 

N  =  6k, 

and  discriminant  function  weights  of 

Bl  =  29.75, 
B2  =  0.25,  and 
B3  =  6.0. 

The  example  executes  the  partitioning  algorithm  and  then 
repeats  the  procedure  with  the  role  of  row  and  column  interchanged. 
This  interchange  is  done  without  modification  of  the  discriminant 
function  weights  Bl,  B2,  and  B3  since,  for  simplicity,  it  is  assumed 
that  the  object  program  execution  is  symmetric  in  rows  and  columns. 
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This  paper  is  an  analytic  study  of  a  mesh  partitioning  method  for  storage 
allocation  when  a  computer  system,  such  as  ILLIAC  IV,  requires  that  partitions 
(or  pages)  be  of  fixed  size.   The  storage  allocation  method  (called  C-SKEW) 
resTilts  in  memory  assignment  indices  which  constitute  additive  cyclic  groups 
for  the  linear  structures  of  the  mesh.  A  storage  optimization  algorithm,  which 
does  not  significantly  contribute  complexity  to  the  intended  computation,  is 
presented.  The  algorithm  determines  the  partitioning  method  resiilting  in  a 
minimal  covering  for  a  simple  mesh  or  for  a  complex  mesh  consisting  of  a 
composition  of  simple  meshes  where  the  composition  itself  is  determined.   The 
paper  also  demonstrates  how  the  storage  method  principles  are  applied  to 
non-orthogonal  and  multi- dimensional  arrays. 
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